In eukaryotic genes the protein-coding sequence is split into several fragments, the exons, separated
by non-coding DNA stretches, the introns. Prokaryotes do not have introns in their genomes. We
report calculations of the stability domains of actin genes for various organisms in the animal, plant
and fungi kingdoms. Actin genes were chosen because they have been highly conserved during
evolution. In these genes all introns were removed so as to mimic ancient genes at the time of
early eukaryotic development, i.e. before intron insertion. Common stability boundaries are found
in evolutionarily distant organisms, which implies that these boundaries date from the early origin
of eukaryotes. In general the boundaries correspond to intron positions in the actins of vertebrates
and other animals, but much less so to those of plants and fungi. The sharpest boundary is found at a
locus where fungi, algae and animals have introns at positions separated by only one nucleotide, which
identifies a hot-spot for insertion. These results suggest that some introns may have been incorporated
into the genomes through a thermodynamically driven mechanism, in agreement with previous observations
on human genes. They also suggest that the mechanism of intron insertion differs between plants and animals.

PACS numbers: 87.15.-v,82.39.Pj 



I. INTRODUCTION 



Unlike their prokaryotic counterparts, the large majority of
eukaryotic genes are split. The parts of the gene which carry
the genetic code from which proteins are synthesized, the exons,
are interrupted by long stretches of "junk DNA", the introns.
Much is still uncertain about introns and about junk DNA in
general. There is, however, a clear advantage for a gene in
hosting introns: different mRNAs, and hence different proteins,
can be synthesized from the same gene through a mechanism known
as alternative splicing (see Fig. 1). In different tissues of a
multicellular organism the mRNAs are synthesized by placing the
exons in a different order or by skipping some of them. This
produces quite similar, but not identical, proteins. Alternative
splicing is responsible for the appearance of slightly different
proteins in, say, brain and liver, both encoded by the same gene.

The origin of introns has triggered considerable debate in past
years. The discussion was polarized into two different
viewpoints: the "introns early" [?] and the "introns late" [?]
theories. The introns late viewpoint states that introns came
"late" in evolution, i.e. after the separation between the
eukaryotic and prokaryotic kingdoms. Ancient genomes, like those
of present-day bacteria, had no introns. During evolution introns
were inserted at some positions in the coding sequences of
eukaryotes. Bacteria did not acquire introns, so as to keep their
genomes short. According to the introns early perspective,
introns were already present in ancient genomes. In these genomes
mini-genes were separated by junk DNA sequences. Complex genes
appeared during evolution when the mini-genes were assembled
together.

Although the issue is not completely settled yet, there is
widespread agreement that most introns were inserted late into
the genome, except for a few which may have a very old origin
[4]. The question that remains unanswered is: through which
mechanism were introns inserted into genes? Did they target
specific stretches of sequence, or was their insertion a random
process?

FIG. 1: Through the mechanism of alternative splicing different
RNAs, and thus different proteins, can be obtained from the same
intron-containing DNA sequence. Typically, these different RNAs
are synthesized in different tissues. In the example shown here
three different RNAs are formed from the same DNA sequence: exons
are colored, while introns are white. (a) Exons are assembled
following the same order as in the DNA sequence. (b) Exon 2 is
skipped. (c) Exons 2 and 3 are in reversed order compared to the
DNA sequence.

In a previous paper [5] we suggested that some introns may have
targeted, and been inserted into, specific regions of a gene
because of the physical stability properties of these regions.
DNA is an inhomogeneous polymer: sequences richer in CG
nucleotides are more stable than AT-rich regions, since CG pairs
form three hydrogen bonds, while AT pairs form only two. Using a
random set of 80 human genes we found a strong correlation
between intron positions and stability boundaries, which are
defined more precisely in the next section.
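
As a rough illustration of the composition argument above, the following Python sketch computes the GC fraction in sliding windows along a sequence. It is only a crude proxy for local stability and is not the melting calculation used in this paper; the function names, the window length and the toy sequence are assumptions made for the example.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_profile(seq: str, window: int) -> list[float]:
    """GC fraction of every window of `window` consecutive nucleotides."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Hypothetical toy sequence: a GC-rich half followed by an AT-rich half.
toy = "GCGCGGCC" + "ATATTAAT"
profile = gc_profile(toy, window=4)
print(profile[0], profile[-1])  # first window is all G/C, last window is all A/T
```

A sharp drop in such a profile marks a change in composition, but the position and sharpness of an actual melting boundary also depend on cooperativity, which a windowed average does not capture.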

The aim of this paper is to investigate this issue further. We
consider here a single gene, actin, and analyze its stability in
animals, plants and fungi. Although originating from a single
gene in a common ancestor, actin genes have diversified during
evolution. Several different actins are present in the genome of
a given eukaryote. By analyzing the stability properties of genes
belonging to a common family we gain insight into the mechanisms
of intron insertion. We will show that common stability
boundaries are found in actin sequences of species belonging to
different kingdoms. This implies that the boundaries observed in
this and in previous work [5] have a very remote origin, dating
back to the development of early eukaryotes, and supports
previous observations that stability boundaries may have
influenced the insertion of at least some introns. An extensive
discussion of the consequences of our findings is given in the
final section of this paper.



II. THERMODYNAMIC STABILITY 

When double-helical DNA in solution is brought to a sufficiently
high temperature the two strands dissociate, or melt. DNA
oligonucleotides of 20-30 base pairs melt at a single
temperature. This temperature can be estimated using the
nearest-neighbor model, from which one computes the Gibbs free
energies, enthalpies and entropies of melting. Considerable
effort has been dedicated in past years to an accurate
determination of these thermodynamic parameters (see e.g. [?] and
references therein), owing to the importance of DNA melting and
of the reverse transition, DNA hybridization, in many
biotechnological processes. Besides the sequence composition, the
melting temperature depends on the salt concentration and pH of
the solution.

If the sequences are sufficiently long, DNA melting becomes a
multistep process [?]. Regions of the sequence which are
GC-richer melt at higher temperatures than GC-poorer regions. The
quantities of interest in this case are the multiple partial
melting temperatures, together with the regions of the sequence
which melt at those temperatures. To perform this type of
calculation various statistical mechanical models have been
developed [?]. The calculations presented in this paper are based
on the Meltsim algorithm [?], in which a DNA configuration is
approximated by a sequence of non-interacting loops and helical
segments according to the Poland-Scheraga model [10, 11]. In this
approach each base pair is in one of two possible states, either
open (θ_i = 0) or closed (θ_i = 1), where θ defines the order
parameter and i is an index running over all base pairs of a
sequence (i = 1, 2, ..., N). In the Meltsim algorithm, recursion
relations [12] and an approximation for the entropy of closed
loops [13] allow a rapid computation of the opening/closing
probability at any given temperature for chains of several
thousand base pairs.

Computations based on the Poland-Scheraga model have been quite
popular in past years [14-20]. Yeramian et al. analyzed the
genomes of S. cerevisiae (yeast) [14] and of P. falciparum [15]
and identified genes on the basis of thermodynamic signals
obtained from the melting analysis. The effects of mismatches
[16] and of disorder [19] on DNA melting have also been
discussed. A recent study has produced the melting map of the
whole human genome [21].

The other popular model for studies of the thermodynamics of DNA
is the Peyrard-Bishop model [?], which has attracted considerable
attention in recent years [?]. This model is probably more
accurate on shorter length scales, as a configuration is
identified by the distances between complementary bases and not
by a simple boolean variable (θ = 0, 1) as in the Poland-Scheraga
picture. However, for the purpose of calculating stability
properties which involve melting domains of about a hundred
bases, the Poland-Scheraga model is good enough. Programs like
Meltsim have been fine-tuned to fit experimental data [?].
Interestingly, the thermodynamic boundaries found with the
Meltsim approach in a previous paper [5] have also been found in
an analysis of the Peyrard-Bishop model [26]. This shows that the
properties discussed here are robust and model independent.

In this paper we have used the same set of thermodynamic
parameters as in Ref. [30]. In order to estimate the thermal
stability boundaries, we proceed as follows. Starting from
sufficiently low temperatures, at which the whole chain is in a
helical state, we increase the temperature in constant small
steps ΔT (= 0.01 °C in the calculation). At each step the
configuration of the chain is calculated and the boundaries
between helix and coil regions are recorded. To discriminate
between a helical and a coiled region we calculate the average
value of θ_i at a given temperature and define the boundary as
the point separating a θ > 1/2 domain from a θ < 1/2 domain.
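
The boundary criterion described above can be sketched in Python as follows. The sketch assumes that some melting model supplies the closing probabilities θ_i at each temperature; here that model is replaced by a hypothetical two-domain toy function, so the code illustrates only the temperature sweep and the θ = 1/2 criterion, not the Meltsim computation itself. All function names are illustrative.

```python
from typing import Callable, Sequence


def helix_coil_boundaries(theta: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Indices i such that base pairs i and i+1 lie on opposite sides of the threshold.

    A value exactly equal to the threshold is counted on the coil side.
    """
    return [
        i
        for i in range(len(theta) - 1)
        if (theta[i] > threshold) != (theta[i + 1] > threshold)
    ]


def sweep(
    closed_probability: Callable[[float], Sequence[float]],
    t_start: float,
    t_stop: float,
    dt: float,
) -> list[tuple[float, list[int]]]:
    """Raise the temperature in steps of dt and record the boundaries at each step."""
    n_steps = int(round((t_stop - t_start) / dt))
    records = []
    for k in range(n_steps + 1):
        temperature = t_start + k * dt
        theta = closed_probability(temperature)
        records.append((temperature, helix_coil_boundaries(theta)))
    return records


def toy_closed_probability(temperature: float) -> list[float]:
    """Hypothetical chain of 10 base pairs: pairs 0-4 open above 80 C, pairs 5-9 above 90 C."""
    return [0.0 if temperature > (80.0 if i < 5 else 90.0) else 1.0 for i in range(10)]


records = sweep(toy_closed_probability, t_start=75.0, t_stop=95.0, dt=5.0)
for temperature, boundaries in records:
    print(temperature, boundaries)
```

In the toy model a single boundary, between base pairs 4 and 5, persists over the interval in which only the less stable domain is open. A boundary that survives over a wide temperature range is what appears as a sharp feature in the melting plots.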

FIG. 2: (Color online) Melting domains for H. sapiens actins. GenBank entries: (a) NM_001101 (actin β, ACTB), (b)
NM_001100 (actin α1, skeletal muscle, ACTA1), (c) NM_005159 (actin α, cardiac muscle, ACTC1), (d) NM_001613 (actin
α2, ACTA2), (e) NM_001614 (actin γ1, ACTG1), (f) NM_001615 (actin γ2, ACTG2). The sequences have high similarity to
those of all other vertebrate actins, for which almost identical melting patterns are found. Hence only human actins are
shown, as representatives of those of all vertebrates. In the plots the temperature is on the x-axis and the sequence
position on the y-axis. The thick solid lines separate the low-temperature helix domain from the high-temperature coiled
state. Horizontal dashed lines indicate the boundaries of the CDS and horizontal solid lines the intron positions for the
given sequence. Arrows point to the intron positions found in homologous sequences.

Typical outputs of the calculations are shown in the graphs of
Figs. 2, 3, 4 and 6. In these graphs the x-axis is the
temperature, while the y-axis represents the position along the
sequence. For each gene we considered only the coding sequence
(CDS) with all introns removed. This is the so-called
complementary DNA (cDNA), which is a double-stranded copy of the
mRNA and can be obtained from it in the laboratory through
reverse transcription. For the purpose of inferring information
on genome evolution we can regard the cDNA as an old gene before
introns were inserted. In order to avoid boundary effects, i.e.
dissociation dominated by the opening of forks at the edges, we
have enclosed the sequences between two stretches of poly(G) of
two hundred nucleotides each (a poly(G) sequence is a stretch of
DNA composed only of G nucleotides; in this case the sequence
referred to is double-stranded, with one strand containing only
G's and the other only C's). These stretches have a high melting
temperature, hence they dissociate well beyond the melting
temperatures of the CDS. The thick solid lines in Figs. 2, 3, 4
and 6 separate the coiled from the helical regions (to the right
and to the left of the curve, respectively). Due to strong
cooperativity [31], the DNA melts through a few sharp transitions
involving the simultaneous dissociation of hundreds of base
pairs. Hence, only a few stability domains are found in the
analysis of a sequence of 1000 base pairs.
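
The embedding of the CDS between two poly(G) stretches can be sketched as below. This is a minimal Python illustration, assuming the sequence is handled as a single-strand string (the complementary strand is implied); the function names and the helper that maps a position in the padded sequence back to a codon number are assumptions made for the example, not part of the original analysis.

```python
def embed_in_poly_g(cds: str, flank_length: int = 200) -> str:
    """Return the CDS flanked on both sides by `flank_length` G nucleotides."""
    if flank_length < 0:
        raise ValueError("flank_length must be non-negative")
    flank = "G" * flank_length
    return flank + cds.upper() + flank


def padded_index_to_codon(index: int, cds_length: int, flank_length: int = 200) -> int:
    """Map a 0-based index in the padded sequence to a 1-based codon number of the CDS."""
    offset = index - flank_length
    if not 0 <= offset < cds_length:
        raise ValueError("index lies in a poly(G) flank, outside the CDS")
    return offset // 3 + 1


# Hypothetical six-nucleotide CDS with short flanks, for illustration only.
padded = embed_in_poly_g("atggcc", flank_length=3)
print(padded)  # GGGATGGCCGGG
```

Keeping track of the flank length in this way lets a boundary found in the padded sequence be reported in codon coordinates, which is how intron positions are labeled in the following sections.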



III. ACTIN 

Actin proteins play a central role in eukaryotes. Actin
filaments constitute the cytoskeleton of all eukaryotic cells,
and are the site of interactions with many other proteins, for
instance motor proteins or actin-bundling proteins. A mutation at
a specific site of an actin protein may result in a change in its
interactions with several proteins that bind near the mutated
site. While the mutation can favor the interaction with one
specific protein, it is likely to disrupt interactions with many
other proteins. Hence, in order to maintain the multiple
interactions with all their partners, actin proteins have been
highly conserved during evolution. Conservation is naturally
lower at the gene level than at the level of the amino acid
sequence of the corresponding protein, as the genetic code is
degenerate and multiple codons encode the same amino acid. For
actin there is roughly 80% sequence conservation between the
human and yeast (S. cerevisiae) genes, compared with 95%
conservation of amino acids in the proteins [1].

Genomic evolution is believed to have occurred mainly through
gene duplication and mutations [1]: at a given time an error in
the replication of DNA produces two copies of a gene, which are
inherited by a daughter cell. These genes then evolve separately,
accumulating different point mutations and thus diverging in
time. As the process is repeated one obtains from a single
ancestor gene a family of closely related genes. In vertebrates
there are three classes of actins [1], known as α, β and γ
actins. The α actins are found in muscle cells, while the β and γ
actins are found in non-muscle cells. Plant actins also form a
large family, with even more genes than in vertebrates. For
instance, more than 10 different actin genes have been identified
in the genome of the plant Arabidopsis thaliana.

TABLE I: Table of intron positions in actin genes for vertebrates and green plants. The label refers to the codon position
following the table of Ref. [32]. The three positions surrounded by a box are those for which a stability boundary was found.

FIG. 3: (Color online) Melting domains for actin genes of D. melanogaster (a,b) and C. elegans (c). The area descriptions
are the same as in Fig. 2. GenBank entries: (a) NM_169525 (Act87E), (b) NM_079643 (Act88F) and (c) NM_073416.



A. Intron positions 

Table I compares the intron positions of vertebrates and land
plants. A more complete table, which contains 56 different intron
positions for actins of different organisms, can be found in Ref.
[32]. In total there are 7 intron positions for vertebrate
actins. These positions are labeled, following the notation of
Ref. [33], by two numbers. The first number refers to the codon
in the sequence and the second one (between 1 and 3) indicates
where the intron is inserted in the codon. A 3 signifies that the
intron is inserted after the third nucleotide of the codon, hence
the intron does not break the codon. The codon numbers are given
with respect to a reference sequence, which is the α actin of
vertebrates. Although plants have more actin isoforms than
vertebrates, somewhat surprisingly they have only 3 intron
positions, one of which (152-1) is in common with the vertebrate
lineage.
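
The two-number labels can be converted into a nucleotide offset along the reference coding sequence, as in the following Python sketch. The text above fixes the meaning of the value 3 only; the sketch assumes, by analogy, that 1 and 2 denote insertion after the first and second nucleotide of the codon. The function names are illustrative.

```python
def intron_offset(label: str) -> int:
    """Number of coding nucleotides that precede the intron for a 'codon-phase' label.

    Assumes phase k means "inserted after the k-th nucleotide of the codon".
    """
    codon_text, phase_text = label.split("-")
    codon, phase = int(codon_text), int(phase_text)
    if codon < 1 or phase not in (1, 2, 3):
        raise ValueError(f"invalid intron label: {label!r}")
    return 3 * (codon - 1) + phase


def breaks_codon(label: str) -> bool:
    """True if the intron falls inside a codon rather than between two codons."""
    return int(label.split("-")[1]) != 3


for label in ("43-3", "152-1", "270-1"):
    print(label, intron_offset(label), breaks_codon(label))
```

With this convention the label 43-3 corresponds to an intron placed after nucleotide 129 of the reference coding sequence, i.e. exactly between codons 43 and 44.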



B. Melting domains for animal actins 

We start with the description of melting domains in animal
actins. Although Fig. 2 shows exclusively human actins, we found
very similar melting profiles in the α, β and γ actins of other
vertebrates such as Canis familiaris (dog), Bos taurus (cow),
Danio rerio (zebrafish), Gallus gallus (chicken), etc. Hence, the
conclusions drawn from the analysis of Fig. 2 are probably valid
for all vertebrates.

Figure 2(a) shows the melting behavior of human actin β (GenBank
entry NM_001101). This sequence has four introns, at positions
43-3, 123-3, 270-1 and 330-3 (see Table I). These positions are
indicated by horizontal lines in Fig. 2(a). In this sequence
melting is a three-state process. First, at around 85 °C, the
exon bounded by introns 270-1 and 330-3 melts. Next the whole CDS
melts, except for a short fragment bounded by the intron at 43-3,
which melts only beyond 90 °C. There is a remarkable
correspondence between the 43-3 position and a sharp stability
boundary. Positions 270-1 and 330-3 show a similar, although
weaker, correspondence. This β actin sequence has already been
analyzed in Ref. [5] (see Fig. 2 therein). In that analysis the
correspondence with the intron at 330-3 was missed because
different boundary conditions were used. In this work the CDS is
embedded between two poly(G) stretches, so that melting inside
the sequence always proceeds through the formation of loops
bounded by two helical regions. In Ref. [5] some parts of the
untranslated regions flanking the CDS were included in the
analysis. As no stable helical regions were included at the
boundaries, part of the melting in Ref. [5] occurred through fork
openings from the ends. The inclusion of untranslated regions,
i.e. of the original genomic neighborhood, in the analysis is
probably not an optimal choice, as these regions are poorly
conserved during evolution. As the aim is to look for signals
from ancient genomes, it is better to embed the CDS between two
poly(G) stretches. In this way all the sequences analyzed are
treated on an equal footing.

Figures 2(b-f) show the melting domains of other homologous
vertebrate actin genes. These sequences have 4 (e), 5 (b,c) and 7
(d,f) intron positions. Three intron positions are common to the
vertebrate actins: 43-3, 270-1 and 330-3. These are also the
positions for which a correspondence with a thermal boundary was
found in the β gene of Fig. 2(a). The correspondence between
thermodynamic boundaries and the intron positions 43-3 and 270-1
is also observed in three other sequences, Fig. 2(b,c,f). Note
the very sharp signal from the 43-3 intron in case (c). The
intron at position 330-3 shows a good correspondence with
stability boundaries for the sequences in Fig. 2(b,c). A much
weaker, but noticeable, correspondence is found with the intron
at position 206-1 in sequences (c) and (d). The stability
boundary is slightly shifted from the 206-1 position in sequences
(d) and (f). The correspondence between intron positions and
thermodynamic boundaries is absent in sequence (d).

Another interesting feature of vertebrate actins can be seen in
Fig. 2(b). This sequence is the actin α1, which hosts 5 introns
in its coding region. These are marked by horizontal lines. The
two remaining positions of the total of 7 intron positions of
vertebrate actins, 86-3 and 123-3, are indicated by arrows. As
can be seen from Fig. 2(b), these two positions correspond to
stability boundaries. The correspondence of a stability boundary
with the 123-3 position is also visible in (c) and (e). In the
latter example the 123-3 position is a nucleation site for a
small loop. The sequences of Fig. 2(d), (e) and (f) show a much
weaker correspondence between intron positions and stability
boundaries.

Fig. 3 shows the melting curves for Drosophila melanogaster
(a,b), the fruit fly, and Caenorhabditis elegans (c), a worm. The
Drosophila actins have at most one intron in the coding region,
either at 15-1 or at 310-1. These positions differ from the
vertebrate positions discussed so far. The sequence shown in Fig.
3(a) has no introns. The melting analysis nevertheless reveals a
few stability boundaries close to the positions 43-3, 86-3, 270-1
and 330-3, which are intron positions of vertebrate actins. The
boundaries at 43-3 and 86-3 are particularly sharp. The next
Drosophila sequence (b), with one intron at position 310-1, shows
a stability boundary close to 270-1 and a weaker one close to
330-3. Compared to case (a), in this sequence the signals from
43-3 and 86-3 have been lost. However, a sharp boundary has
appeared close to the vertebrate intron position 123-3. The C.
elegans sequence of Fig. 3(c) has two introns at "new" positions,
65-1 and 325-2. As in the previous examples, one observes
boundaries close to the 43-3, 123-3, 270-1 and 330-3 positions.



C. Melting domains for plant actins 

Figure 4 shows the melting domains for actin genes of the green
plant Arabidopsis thaliana. The intron positions of actin
sequences of higher plants are highly conserved (see Table I),
which indicates that these introns date back to the early
evolution of land plants. In 3 of the 9 Arabidopsis sequences
shown in Fig. 4 (a, d, f) we find a correspondence between a
thermal boundary and an intron at 152-1. This is the intron which
is in common with vertebrates (see Table I). As in the Drosophila
and C. elegans sequences (Fig. 3), stability boundaries in
general tend to be found at the vertebrate positions 43-3, 86-3,
270-1 and 330-3. In a few cases the correspondence is very
striking, as in Fig. 4(d).

In order to corroborate these findings we extended the analysis
to other plants. We considered 12 additional actins from
Nicotiana tabacum (tobacco, GenBank X63603), Oryza sativa (rice,
GenBank X15862, X15863, X15864, X15865), Glycine max (soybean,
GenBank J01298, V00450), Solanum tuberosum (potato, GenBank
X55749, X55750, X55751, X55752) and Striga asiatica (GenBank
U68461, U68462). With the 9 sequences from A. thaliana we have in
total 21 plant actin genes. For each sequence the melting curves
were calculated and then averaged. The result is shown in Fig. 5.
In the graph the x and y axes are swapped compared to Figs. 2-4.
The x-axis is the codon position, and the temperature, now on the
y-axis, increases from top to bottom. As a reference, four of the
most commonly found intron positions for actin genes are
indicated by vertical lines. The averaging introduces some
smoothing, but it confirms the existence of a sharp stability
boundary close to the 43-3 position. Two weaker boundaries are
found close to the positions 86-3 and 270-1.
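
The averaging step can be illustrated with a short Python sketch. The text does not specify how the curves were aligned or averaged, so the sketch makes explicit assumptions: each gene is represented by a profile giving one melting temperature per codon position, all profiles have the same length, and a plain arithmetic mean is taken position by position. The numbers in the toy profiles are invented for illustration and are not data from this work.

```python
def average_profiles(profiles: list[list[float]]) -> list[float]:
    """Position-by-position arithmetic mean of equally long melting profiles."""
    if not profiles:
        raise ValueError("at least one profile is required")
    length = len(profiles[0])
    if any(len(profile) != length for profile in profiles):
        raise ValueError("all profiles must have the same length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical profiles (temperatures in degrees C) for two genes, three positions each.
# Both have a step in melting temperature, at slightly different positions.
toy_profiles = [
    [80.0, 80.0, 90.0],
    [82.0, 84.0, 90.0],
]
print(average_profiles(toy_profiles))  # [81.0, 82.0, 90.0]
```

The toy example shows the smoothing mentioned above: steps located at slightly different positions in individual genes are averaged into a more gradual rise, so a boundary that stays sharp after averaging must be located at nearly the same position in most sequences.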



D. Melting domains for fungi actins 

To conclude the analysis of the stability behavior of actin
genes we now consider fungi. Fig. 6 shows the melting curves for
the budding yeast Saccharomyces cerevisiae (a), for Neurospora
crassa (b) and for Candida albicans (c). In general the number of
introns and their positions are highly variable in fungal actin
genes: their number ranges up to 7 and the positions are mostly
concentrated in the region before the 50th codon. This can also
be seen in the sequences of Fig. 6, two of which have one intron
and one of which has four. All introns are found before codon 45.
The melting behavior shown in Fig. 6 resembles that of the
previous cases. There is a sharp stability boundary close to the
43-3 position, and some weaker ones appear close to positions
270-1 and 330-3 in case (b). This correlation is absent in case
(a).

FIG. 4: (Color online) Melting domains for actin sequences of A. thaliana. The area descriptions are the same as in Fig. 2.
GenBank entries: (a) M20016 (AAc1), (b) NM_180280 (ACT2), (c) U39480 (ACT3), (d) U27980 (ACT4), (e) U27811 (ACT7),
(f) U42007 (ACT8), (g) U27981 (ACT11), (h) U27982 (ACT12) and (i) U37281 (actin 2).

FIG. 5: (Color online) Average melting curves from 21 green
plant actin genes. The axes of the diagram are swapped compared
to those in Figs. 2-4. The horizontal axis is given in codon
position; only the coding sequence is shown. The four vertical
lines denote the four major intron positions found to correlate
with stability boundaries in vertebrates.



E. A "hotspot" for intron insertion

In Ref. [32] the full table of intron positions in actin genes
gives 56 different positions. There is an interesting remark to
be made concerning the position 43-3, which is shared by all
vertebrate actins. Position 43-3 is relatively common in animals,
but it is also the sole intron in the actin gene of the red alga
Chondrus crispus [32]. Shifted by a single nucleotide, at
position 44-1, there is an intron in the alga Cyanophora paradoxa
[32]. An intron at position 44-2 is present in the single-copy
actin genes of the fungi Thermomyces lanuginosa, Aspergillus
niger, Neurospora crassa (see Fig. 6(b)) and Trichoderma reesei
[32]. The only other positions with introns separated by a single
nucleotide are 34-1 and 34-2, which however are found only in
some fungi. Hence 43-3 is a unique site in the actin genes, which
we can refer to as a "hotspot" for intron insertion.
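
The statement that these positions are separated by a single nucleotide can be checked with a few lines of Python. The sketch below converts each label into the number of coding nucleotides preceding the intron, assuming that a phase k means insertion after the k-th nucleotide of the codon (the text states this only for k = 3), and then groups labels with consecutive offsets. The function names are illustrative, and the list contains only positions quoted in the text, not the full table of Ref. [32].

```python
def offset(label: str) -> int:
    """Coding nucleotides preceding the intron for a 'codon-phase' label."""
    codon, phase = (int(part) for part in label.split("-"))
    return 3 * (codon - 1) + phase


def adjacent_runs(labels: list[str]) -> list[list[str]]:
    """Groups of two or more labels whose offsets are consecutive integers."""
    ordered = sorted(set(labels), key=offset)
    runs, current = [], []
    for label in ordered:
        if current and offset(label) - offset(current[-1]) == 1:
            current.append(label)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [label]
    if len(current) > 1:
        runs.append(current)
    return runs


quoted_positions = ["43-3", "44-1", "44-2", "34-1", "34-2", "123-3", "270-1", "330-3"]
print(adjacent_runs(quoted_positions))
# [['34-1', '34-2'], ['43-3', '44-1', '44-2']]
```

Under the stated convention the three labels 43-3, 44-1 and 44-2 correspond to offsets 129, 130 and 131, which is the run of three neighboring insertion sites discussed above.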



IV. DISCUSSION 

DNA sequences which are hundreds of base pairs long tend to melt
through a series of separate temperature steps. Each step
consists of the melting of a region of a few hundred base pairs.
By following the melting process over a wide temperature interval
one can thus identify separate melting domains, i.e. parts of the
sequence which dissociate at different temperatures. The domain
boundaries are points at which the sequence tends to form, over a
relatively wide temperature interval, a stable Y-conformation
separating a double helix from a coiled region (see Fig. 7).

A previous study [5] of about 80 human genes, from which the
introns were removed and the exons linked together, revealed that
stability domain boundaries tend to be localized at the ends of
exons. This correspondence was found for about 35% of the exons
analyzed. The correlation was found to be stronger for the class
of so-called housekeeping genes, i.e. those genes involved in the
basic cellular processes. These genes are expressed in all
tissues and have been more conserved during evolution. Actin is
in fact an example of a housekeeping gene. If one accepts an
"introns late" viewpoint, the correlation between intron
positions and stability boundaries suggests that some introns
were inserted into genes at the ends of the melting domains in a
process driven by thermodynamics. Such a process is illustrated
in Fig. 7(I): an intronless fragment of a gene naturally has
parts which are richer and parts which are poorer in CG
nucleotides. When the two strands partially separate they may
form a Y configuration at the end of a less stable domain and the
beginning of a more stable one. Introns may have targeted these
fork locations.

Another possibility that might explain the correlation between
thermodynamic boundaries and intron positions observed in Ref.
[5] is shown schematically in Fig. 7(II). Originally the
insertion site does not possess a thermodynamic boundary, so the
intron is inserted through a process which does not depend on
thermodynamics. Once the insertion has taken place and the two
exons are separated by an intron stretch, mutations may have
biased the CG content of the two exons, so that their
thermodynamic boundary originated after the intron insertion.
However, this scheme is at odds with the results presented in
this paper. We have indeed shown that boundaries at conserved
positions are found in actin family genes where no introns are
present close to those positions. For instance, in many actins of
plants, fungi and animals there is a sharp stability boundary at
the position 43-3 in sequences which have no intron at that
position. Hence, being found in plant, animal and fungal
sequences, the stability boundary at 43-3 is rather a property of
an intronless ancestral actin gene. The same is true for the
stability boundaries found at other intronless positions, for
instance 86-3 and 270-1.

We further speculate on the "introns early" perspective, i.e.
the possibility that introns were already present in early
genomes and were selectively lost by some species. Our findings
then imply that these introns would have separated early exons
(or mini-genes, as they are also referred to) with different
stability properties, as shown in the scheme of Fig. 7(III).
Although possible, this scenario seems to be in contradiction
with most of the recent phylogenetics-based studies, which favor
an introns late theory.

In conclusion, our work supports the mechanism given in Fig.
7(I), i.e. a thermodynamically driven intron insertion. This does
not necessarily mean that the actual insertion process took place
through an equilibrium transition with a temperature rise to
80 °C. First of all, the melting temperature also depends on the
salt concentration and pH of the environment. Moreover, the
boundaries found in the melting analysis should also manifest
themselves under nonequilibrium conditions. Y-configurations such
as those shown in Fig. 7 can also be generated by mechanical
unzipping of DNA [33]. Quite remarkable is the fact that the
sharpest boundary in actin genes (43-3) is also the locus at
which intron insertion has been most active in evolutionarily
distant organisms. As we have pointed out, introns have also been
found at positions 44-1 and 44-2. This fact is in agreement with
the idea of an insertion driven by thermodynamics. As the
boundary is particularly sharp, the mechanism of Fig. 7(I) may
have occurred independently in three different families of actin
genes.

FIG. 6: (Color online) Melting domains for actin sequences of fungi. The area descriptions are the same as in Fig. 2. (a)
Saccharomyces cerevisiae (yeast), (b) Neurospora crassa and (c) Candida albicans. GenBank entries: (a) X61502 (Act2), (b)
U78026 and (c) X16377 (act1).

FIG. 7: Possible scenarios of gene evolution: (I), (II) introns late and (III) introns early perspectives. Scheme (I) is the
insertion driven by thermodynamics, supported by the analysis of stability regions reported in this paper. Scheme (II)
suggests the appearance of a thermal boundary after intron insertion, a scenario not supported by the results presented in
this paper. Finally, scheme (III) takes into account the introns early perspective, in which the ancient exons already had
different stability properties.

The correlation between stability boundaries and intron
positions is particularly sharp in several of the sequences
analyzed, but in a few cases it is absent. We believe that this
is due to mutations having erased the correlation from the
ancestral gene sequence. Although actin is highly conserved as a
protein, there is no selective pressure against synonymous
mutations, which do not modify the amino acid sequence. Such
mutations are known to have occurred at a roughly constant rate
in all genes of a given organism [1]. Two genes of the same
family in the same organism have evolved separately, and
mutations may have accumulated at higher or lower rates in
different parts of the sequence: in some genes the mutations may
have erased the ancient stability boundaries.

The problem of intron evolution has been widely debated in the
biological literature (for a recent review of the state of the
art see Ref. [34]). Even within the introns late perspective
there is no general consensus on the mechanism of insertion, and
several possibilities have been analyzed. For instance, Ref. [34]
reports 5 different models of intron insertion. Most of these
models discuss the mechanism of insertion in general terms,
without suggesting at which position of the sequence the
insertion would have occurred. One exception is the protosplice
site model [35], which suggests a bias towards a specific
insertion sequence (C/A)AG(G/A) (here C/A denotes a site which
can possess either a nucleotide C or A), referred to as the
protosplice site. This insertion would have led to a structure
(C/A)AG-intron-(G/A). The protosplice model and other models for
intron insertion are only partially supported by the analysis of
genomic data [?].
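
As an illustration of what the protosplice motif (C/A)AG(G/A) means in practice, the following Python sketch lists the candidate insertion points in a sequence, i.e. the positions between the AG and the following G/A. It is a plain motif search under the stated pattern, not the analysis of Ref. [35]; the function name and the toy sequences are assumptions made for the example.

```python
import re

# Lookahead so that overlapping occurrences of the four-nucleotide motif are all found.
PROTOSPLICE = re.compile(r"(?=[CA]AG[GA])")


def protosplice_insertion_points(seq: str) -> list[int]:
    """0-based indices p such that seq[:p] ends with (C/A)AG and seq[p] is G or A."""
    return [match.start() + 3 for match in PROTOSPLICE.finditer(seq.upper())]


toy = "TTCAGGTT"  # hypothetical sequence containing one CAGG motif
points = protosplice_insertion_points(toy)
for p in points:
    print(toy[:p] + "-intron-" + toy[p:])  # TTCAG-intron-GTT
```

A comparison of such candidate points with the positions of stability boundaries would be one way to ask whether the two criteria select the same sites, but no such comparison is made here.
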
Unfortunately, the genomes investigated nowadays have been
heavily reshaped by hundreds of millions of years of evolution
and are quite different from the genomes of early eukaryotes.
Hence the question of the origin of introns is not an easy one to
answer. Indeed, although introns were discovered 30 years ago,
there is still an open debate on this issue. Moreover, evolution
may have taken place through complex and diversified pathways, so
it is not unlikely that different mechanisms of insertion have
coexisted. Certainly the possibility that the physical and
thermodynamic stability of the double helix has also played a
role offers new insights and stimulates further research in this
field.